Optimizing large collections of continuous content-based RSS aggregation queries
Authors
Abstract
In this article, we present RoSeS (Really Open Simple and Efficient Syndication), a generic framework for content-based RSS feed querying and aggregation. RoSeS follows a data-centric approach that combines standard database concepts such as declarative query languages, views, and multi-query optimization. Users create personalized feeds by defining and composing content-based filtering and aggregation queries over collections of RSS feeds. Publishing these queries corresponds to defining views, which can in turn be used to build new queries or feeds. This naturally reflects the publish-subscribe nature of RSS applications. The contributions presented in this article are a declarative RSS feed aggregation language, an extensible stream algebra for building efficient continuous multi-query execution plans for RSS aggregation views, a multi-query optimization strategy for these plans, and a running prototype based on a multi-threaded asynchronous execution
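The abstract describes personalized feeds as content-based filtering and aggregation queries over RSS feeds, published as views that other queries can build on, and evaluated together in a shared continuous execution plan. The Python sketch below illustrates these ideas under simplifying assumptions: an item is reduced to a source, title, and description; the only operators are a keyword filter and a duplicate-eliminating union; and multi-query optimization is approximated by reusing an existing operator whenever an identical sub-query is requested (common-subexpression sharing). All names (`Item`, `KeywordFilter`, `Union`, `Planner`, the feed names) are hypothetical and do not reproduce the RoSeS language, algebra, or optimizer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Item:
    """A simplified RSS item (hypothetical fields)."""
    source: str
    title: str
    description: str


class Operator:
    """A continuous operator: consumes items and pushes results to subscribers."""

    def __init__(self) -> None:
        self.subscribers: List["Operator"] = []
        self.output: List[Item] = []  # items this view has published so far

    def subscribe(self, op: "Operator") -> None:
        self.subscribers.append(op)

    def emit(self, item: Item) -> None:
        self.output.append(item)
        for sub in self.subscribers:
            sub.on_item(item)

    def on_item(self, item: Item) -> None:
        raise NotImplementedError


class Source(Operator):
    """Entry point for items fetched from one RSS feed."""

    def on_item(self, item: Item) -> None:
        self.emit(item)


class KeywordFilter(Operator):
    """Content-based selection: keep items whose text contains a keyword."""

    def __init__(self, keyword: str) -> None:
        super().__init__()
        self.keyword = keyword.lower()
        self.evaluations = 0  # number of times the predicate was evaluated

    def on_item(self, item: Item) -> None:
        self.evaluations += 1
        if self.keyword in f"{item.title} {item.description}".lower():
            self.emit(item)


class Union(Operator):
    """Aggregates several input streams, dropping exact duplicates (assumed set semantics)."""

    def __init__(self) -> None:
        super().__init__()
        self.seen = set()

    def on_item(self, item: Item) -> None:
        if item not in self.seen:
            self.seen.add(item)
            self.emit(item)


class Planner:
    """Builds one shared plan: identical sub-queries map to one operator."""

    def __init__(self) -> None:
        self.sources: Dict[str, Source] = {}
        self.shared: Dict[Tuple, Operator] = {}

    def source(self, name: str) -> Source:
        return self.sources.setdefault(name, Source())

    def union(self, *inputs: Operator) -> Operator:
        unique = {id(op): op for op in inputs}
        key = ("union",) + tuple(sorted(unique))  # input order does not matter
        if key not in self.shared:
            op = Union()
            for child in unique.values():
                child.subscribe(op)
            self.shared[key] = op
        return self.shared[key]

    def filter(self, child: Operator, keyword: str) -> Operator:
        key = ("filter", id(child), keyword.lower())
        if key not in self.shared:
            op = KeywordFilter(keyword)
            child.subscribe(op)
            self.shared[key] = op
        return self.shared[key]

    def publish(self, feed: str, item: Item) -> None:
        self.sources[feed].on_item(item)


planner = Planner()
a, b = planner.source("feed-a"), planner.source("feed-b")
# Two users define the same query; the planner reuses one operator for both.
python_news = planner.filter(planner.union(a, b), "python")
same_query = planner.filter(planner.union(b, a), "Python")
# A published view can itself be queried (view composition).
python_jobs = planner.filter(python_news, "job")

planner.publish("feed-a", Item("feed-a", "Python 3.13 released", "New features"))
planner.publish("feed-b", Item("feed-b", "Python job offer", "Remote position"))
planner.publish("feed-b", Item("feed-b", "Rust news", "Compiler update"))
print(len(python_news.output), len(python_jobs.output), python_news.evaluations)  # prints: 2 1 3
```

Because `union(a, b)` and `union(b, a)` resolve to the same operator, the second user's query adds no operators to the plan, and each incoming item is tested against the shared "python" predicate only once. Keying operators by child identity is a deliberately simple choice; a real optimizer would also need to recognize equivalent but syntactically different sub-plans.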
Similar resources
RoSeS: A Continuous Content-Based Query Engine for RSS Feeds
In this article we present RoSeS (Really Open Simple and Efficient Syndication), a generic framework for content-based RSS feed querying and aggregation. RoSeS is based on a data-centric approach, using a combination of standard database concepts like declarative query languages, views and multiquery optimization. Users create personalized feeds by defining and composing content-based filtering...
Best-Effort Refresh Strategies for Content-Based RSS Feed Aggregation
During the past several years, RSS-based content syndication has become a standard technique for the efficient and timely dissemination of information on the web. From a data processing perspective, RSS feeds are standard XML resources which are periodically refreshed by feed aggregators for generating continuous streams of items. In this article, we study the problem of information loss in the contex...
DescribeX: A Framework for Exploring and Querying XML Web Collections
DescribeX: A Framework for Exploring and Querying XML Web Collections. Flavio Rizzolo, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2008. The nature of semistructured data in web collections is evolving. Even when XML web documents are valid with regard to a schema, the actual structure of such documents exhibits significant variations across collections for s...
OptimAX: optimizing distributed continuous queries
1 Setting. Fulfilling the vision of a decentralized Web of peers requires efficient mechanisms for decentralized dissemination of information. RSS feeds are part of this vision: incremental updates to XML documents are pushed from a given producer to a set of subscribers along known paths. In this work, we envision processing continuous XML queries. Such queries are expressed in some XML query l...
Effective Browsing of Image Search Results through Visual and Diverse Summarization via Clustering
With the unprecedented growth in the production of digital images and the use of multimedia references, the need for image and subject search has increased. Systematic processing of this information is a basic prerequisite for its effective analysis, organization, and management. Likewise, large collections of images have been made available on the Web and many search engines have provided the poss...